Search CORE

65 research outputs found

Topology independent protein structural alignment

Author: Binkowski TA
DasGupta Bhaskar
Dundas Joe
Liang Jie
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract. Protein structural alignment is an indispensable tool used for many different studies in bioinformatics. Most structural alignment algorithms assume that the structural units of two similar proteins will align sequentially. This assumption may not be true for all similar proteins and as a result, proteins with similar structure but with permuted sequence arrangement are often missed. We present a solution to the problem based on an approximation algorithm that finds a sequenceorder independent structural alignment that is close to optimal. We first exhaustively fragment two proteins and calculate a novel similarity score between all possible aligned fragment pairs. We treat each aligned fragment pair as a vertex on a graph. Vertices are connected by an edge if there are intra residue sequence conflicts. We regard the realignment of the fragment pairs as a special case of the maximum-weight independent set problem and solve this computationally intensive problem approximately by iteratively solving relaxations of an appropriate integer programming formulation. The resulting structural alignment is sequence order independent. Our method is insensitive to gaps, insertions/deletions, and circular permutations.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

FLORA: a novel method to predict protein function from structure in diverse superfamilies

Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

Ligand binding site superposition and comparison based on Atomic Property Fields: identification of distant homologues, convergent evolution and PDB-wide clustering of binding sites

Author: A Brakoulias
A Marchler-Bauer
AV Grigoryan
CD Michener
D Giganti
E Kellenberger
F Ferre
J An
K Yeturu
L Lo Conte
M Baroni
M Jambon
M Pastor
M Totrov
M Totrov
MA Kastenholz
Maxim Totrov
ND Gold
PJ Goodford
R Abagyan
R Abagyan
R Abagyan
R Powers
RA Bauer
RP Sheridan
S Schmitt
TA Binkowski
V Campagna-Slater
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

A new binding site comparison algorithm using optimal superposition of the continuous pharmacophoric property distributions is reported. The method demonstrates high sensitivity in discovering both, distantly homologous and convergent binding sites. Good quality of superposition is also observed on multiple examples. Using the new approach, a measure of site similarity is derived and applied to clustering of ligand binding pockets in PDB

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Isolation and in silico characterization of novel esterase gene with β-lactamase fold isolated from metagenome of north western Himalayas

Author: A Knietsch
A Wiseman
AC Wallace
AK Sudan
C Schmeisser
D Perez
DE Gillespie
EI Petersen
EY Yu
J Foght
J Vakhlu
J Vidya
JA Fuhrman
JD Bendtsen
JL Arpigny
K Liebeton
K Rashamuse
K Rashamuse
M Wiederstein
MA Larkin
ML Verdonk
N Eswar
N Mokoena
P Mullany
PF Gherardini
R Berlemont
R Daniel
RA Laskoswski
S Biver
T Morrohoshi
TA Binkowski
UG Wagner
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

Author: A Andreeva
A Gutteridge
AH Elcock
AR Panchenko
B Lee
B Rost
BW Mathews
CA Innis
Cathy H Wu
CH Wu
DK Smith
GJ Bartlett
H Yao
HM Berman
IH Witten
JC Platt
JD Thompson
JS Milton
K Kinoshita
K Sjolander
M Ota
MA Hearst
MJ Ondrechen
Natalia V Petrova
O Lichtarge
P Aloy
PP Wangikar
R Kohavi
R Koradi
R Landgraf
RL Tatusov
S Chakravarty
S Jones
S Parthasarathy
S Zhu
SF Altschul
SJ Campbell
SJ Hubbard
TA Binkowski
W Kabsch
W Tian
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties

Author: A Gutteridge
A Shulman-Peleg
AH Elcock
AP Bradley
C Enroth
CT Porter
D Ming
E Youn
F Glaser
F Wilcoxon
G Amitai
G Cheng
GJ Bartlett
J Ko
J Liang
JD Madura
L Xie
Leonel F. Murga
LF Murga
M Ota
M Silberstein
Mary Jo Ondrechen
Michael Levitt
MJ Best
MJ Ondrechen
MK Gilson
N Petrova
P Domingos
R Edgar
R Greaves
RA Laskowski
RA Laskowski
Ronald J. Williams
T Robertson
TA Binkowski
W Tong
W Tong
Wenxu Tong
Y Wei
Y Wei
Ying Wei
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination THEMATICS features represent the single most important component of such classifiers

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Accuracy of Protein-Protein Binding Sites in High-Throughput Template-Based Modeling

Author: A Sali
A Sivasubramanian
A Tovchigrechko
B Ma
B Pils
B Wallner
D Devos
D Petrey
D Vitkup
ERS Kunji
F Alber
I Bartova
I Friedberg
IA Vakser
IA Vakser
Ilya A. Vakser
J Hunjan
J Janin
J Moult
JH Han
M Brylinski
M Brylinski
M Sikic
MF Lensink
NY Forlemu
O Keskin
P Aloy
P Aloy
P Aloy
P Kundrotas
P Lijnzaad
Petras J. Kundrotas
PJ Kundrotas
PJ Kundrotas
R Mosca
R Mosca
RB Russell
RB Russell
Ruth Nussinov
S Henikoff
S Mika
SF Altschul
TA Binkowski
Y Gao
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criteria had RMSD<5 Å, the accuracy suitable for low-resolution template free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å<RMSD<10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

Author: A Andreeva
A Bateman
A Godzik
A Gutteridge
A Kouranov
A Shulman-Peleg
AC Wallace
AG Murzin
BE Engelhardt
Bing Xiong
C Chothia
CA Orengo
D Lee
D Pal
David L Burk
DF Veber
GJ Kleywegt
GP Brady
HM Berman
Hualiang Jiang
J Blaszczyk
J Soding
Jie Wu
Jingkang Shen
K Lundstrom
K Yeturu
L Holm
L Xie
M Ashburner
M Brylinski
Mengzhu Xue
MP Liang
ND Gold
OC Redfern
P Willett
RA Laskowski
RA Laskowski
RB Russell
S Schmitt
SF Altschul
SG Buchanan
T Fawcett
T Hamelryck
TA Binkowski
WA Warr
WR Pearson
XY Jiang
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology. Results Here we present an ultrafast method, named BSSF(Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins. Conclusions This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of hit list from database searching.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

Author: Bing Xiong
Jie Wu
David L Burk
Mengzhu Xue
Hualiang Jiang
Jingkang Shen
WA Warr
A Kouranov
A Godzik
OC Redfern
SG Buchanan
K Lundstrom
DF Veber
D Lee
SF Altschul
A Bateman
BE Engelhardt
J Soding
C Chothia
L Holm
AG Murzin
CA Orengo
A Andreeva
TA Binkowski
GJ Kleywegt
RA Laskowski
RB Russell
S Schmitt
A Shulman-Peleg
AC Wallace
T Hamelryck
M Ashburner
P Willett
HM Berman
GP Brady
WR Pearson
A Gutteridge
T Fawcett
ND Gold
J Blaszczyk
K Yeturu
RA Laskowski
L Xie
MP Liang
M Brylinski
XY Jiang
D Pal
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Queen's University Belfast Research Portal

Crossref

Southampton (e-Prints Soton)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

VASP: A Volumetric Analysis of Surface Properties Yields Insights into Protein-Ligand Binding Specificity

Author: A Kahraman
A Nicholls
AS Yang
Barry Honig
BJ Polacco
Brian Y. Chen
BY Chen
BY Chen
CA Orengo
CB Barber
D Petrey
DM Shotton
F Alpy
G Biggiogero
GI Berglund
GP Brady Jr
HB Voelcker
HM Berman
HSM Coxeter
I Botos
I Schechter
IN Shindyalov
J An
J Felsenstein
J Liang
J Schaer
JA Barker
JD Robertus
JE Mogensen
JF Gibrat
Jie Liang
K Kinoshita
K Morihara
KP Peters
L Graf
L Holm
L Xie
LP Clarke
M Nayal
M Rosen
MA Larkin
MJ Romanowski
ML Connolly
MN Nalam
N Kudo
O Pasternak
PH Sneath
PJ Artymiuk
R Chauvin
R Nussinov
RA Laskowski
RB Russell
RS Bohacek
S Schmitt
SL Roderick
T Ju
T Ju
TA Binkowski
TA Steitz
TR Stouch
W Heiden
WE Lorensen
WL DeLano
X Zhang
X Zhang
Y Tsujishita
YJ Im
YY Tseng
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

Many algorithms that compare protein structures can reveal similarities that suggest related biological functions, even at great evolutionary distances. Proteins with related function often exhibit differences in binding specificity, but few algorithms identify structural variations that effect specificity. To address this problem, we describe the Volumetric Analysis of Surface Properties (VASP), a novel volumetric analysis tool for the comparison of binding sites in aligned protein structures. VASP uses solid volumes to represent protein shape and the shape of surface cavities, clefts and tunnels that are defined with other methods. Our approach, inspired by techniques from constructive solid geometry, enables the isolation of volumetrically conserved and variable regions within three dimensionally superposed volumes. We applied VASP to compute a comparative volumetric analysis of the ligand binding sites formed by members of the steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains and the serine proteases. Within both families, VASP isolated individual amino acids that create structural differences between ligand binding cavities that are known to influence differences in binding specificity. Also, VASP isolated cavity subregions that differ between ligand binding cavities which are essential for differences in binding specificity. As such, VASP should prove a valuable tool in the study of protein-ligand binding specificity

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central